17 research outputs found

    Detection of horizontal gene transfers: models and algorithms applied to the evolution of species and languages

    Get PDF
    Horizontal gene transfer (HGT, also called lateral gene transfer) is a natural evolutionary mechanism consisting in the direct transfer of genetic material from one species to another. The possibility that horizontal gene transfer plays a key role in biological evolution represents a fundamental shift, occurring over the past years, in our perception of general aspects of evolutionary biology. For example, bacteria and viruses possess sophisticated mechanisms for acquiring new genes by horizontal transfer, allowing them to adapt and evolve successfully in their environment. Until very recently, methods for detecting this mechanism relied essentially on sequence analysis and were very rarely automated. The evolution of organisms that have undergone HGT cannot be represented by acyclic phylogenetic trees; the adequate representation is a network. In this thesis, we describe a new model of this evolutionary mechanism, based on the study of topological and metric differences between a species tree and a gene tree inferred for the same set of species. The resulting methods were applied to real data sets for which hypotheses of lateral gene transfer were plausible. Monte Carlo simulations were carried out to assess the quality of the results with respect to existing methods. We also present a generalization of the complete horizontal transfer model that can be used to detect partial transfers and to identify mosaic genes; in this latter model, only a part of the gene is assumed to have been transferred. Finally, we present an application of these new methods to modelling word borrowings that occurred during the evolution of the Indo-European languages. AUTHOR KEYWORDS: phylogenetic tree, reticulate network, horizontal gene transfer, least-squares criterion, Robinson and Foulds distance, bipartition dissimilarity, biolinguistics
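
    The comparison at the heart of the model, the topological disagreement between a species tree and a gene tree, can be illustrated with the Robinson and Foulds distance over bipartitions (both appear in the author keywords). A minimal sketch, assuming each unrooted tree has already been reduced to its set of non-trivial splits; the helper names are ours, not the thesis software:

        from typing import FrozenSet, Set

        Split = FrozenSet[FrozenSet[str]]  # a bipartition = the pair of taxon blocks it induces

        def split(side_a, side_b) -> Split:
            """Orientation-free representation of one bipartition of the taxon set."""
            return frozenset({frozenset(side_a), frozenset(side_b)})

        def robinson_foulds(splits_species: Set[Split], splits_gene: Set[Split]) -> int:
            """Count the non-trivial splits present in exactly one of the two trees."""
            return len(splits_species ^ splits_gene)

        # Species tree ((A,B),(C,D)) vs. gene tree ((A,C),(B,D)): no shared split, distance 2.
        species_splits = {split({"A", "B"}, {"C", "D"})}
        gene_splits = {split({"A", "C"}, {"B", "D"})}
        print(robinson_foulds(species_splits, gene_splits))  # -> 2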

    Weighted bootstrapping: a correction method for assessing the robustness of phylogenetic trees

    Get PDF
    Background: Non-parametric bootstrapping is a widely used statistical procedure for assessing the confidence of model parameters based on the empirical distribution of the observed data [1] and, as such, it has become a common method for assessing tree confidence in phylogenetics [2]. Traditional non-parametric bootstrapping does not weight the trees inferred from resampled (i.e., pseudo-replicated) sequences. Hence, the quality of these trees is not taken into account when computing bootstrap scores associated with the clades of the original phylogeny. As a consequence, trees with different bootstrap support, or trees providing a different fit to the corresponding pseudo-replicated sequences (the fit quality can be expressed through the least-squares (LS), maximum-likelihood (ML) or parsimony score), traditionally contribute in the same way to the computation of the bootstrap support of the original phylogeny. Results: In this article, we discuss the idea of applying weighted bootstrapping to phylogenetic reconstruction by weighting each phylogeny inferred from resampled sequences. Tree weights can be based either on the LS tree estimate or on the average secondary bootstrap score (SBS) associated with each resampled tree. Secondary bootstrapping consists of estimating the bootstrap scores of the trees inferred from resampled data. The LS- and SBS-based bootstrapping procedures were designed to take into account the quality of each pseudo-replicated phylogeny in the final tree estimation. A simulation study was carried out to evaluate the performance of five weighting strategies: LS-based and SBS-based bootstrapping, LS-based and SBS-based bootstrapping with data normalization, and traditional unweighted bootstrapping. Conclusions: The simulations conducted with two real data sets and the five weighting strategies suggest that SBS-based bootstrapping with data normalization usually yields larger bootstrap scores and higher robustness than the four other competing strategies, including traditional bootstrapping. The high robustness of the normalized SBS could be particularly useful in situations where the observed sequences have been affected by noise or have undergone massive insertion or deletion events. The results provided by the four other strategies were very similar regardless of the noise level, thus also demonstrating the stability of the traditional bootstrapping method.
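
    A minimal sketch of the weighting idea, assuming each pseudo-replicate tree has been reduced to its set of clades together with a quality weight (for example a normalized LS or SBS score); the toy data and function name are illustrative, not the authors' implementation:

        from typing import Dict, FrozenSet, Iterable, List, Tuple

        Clade = FrozenSet[str]

        def weighted_support(original_clades: Iterable[Clade],
                             replicates: List[Tuple[set, float]]) -> Dict[Clade, float]:
            """Support of a clade of the original tree = sum of the weights of the
            pseudo-replicate trees that contain it, divided by the total weight.
            With equal weights this reduces to the traditional bootstrap proportion."""
            total = sum(weight for _, weight in replicates)
            return {clade: sum(weight for clades, weight in replicates if clade in clades) / total
                    for clade in original_clades}

        # One clade of interest and three pseudo-replicate trees with LS-based weights.
        ab = frozenset({"A", "B"})
        replicates = [({ab}, 0.5), ({ab}, 0.3), ({frozenset({"A", "C"})}, 0.2)]
        print(weighted_support([ab], replicates))  # {frozenset({'A', 'B'}): 0.8} vs. 2/3 unweighted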

    Armadillo 1.1: An Original Workflow Platform for Designing and Conducting Phylogenetic Analysis and Simulations

    Get PDF
    In this paper we introduce Armadillo v1.1, a novel workflow platform dedicated to designing and conducting phylogenetic studies, including comprehensive simulations. A number of important phylogenetic and general bioinformatics tools have been included in the first software release. As Armadillo is an open-source project, it allows scientists to develop their own modules as well as to integrate existing computer applications. Using our workflow platform, different complex phylogenetic tasks can be modeled and presented in a single workflow without any prior knowledge of programming techniques. The first version of Armadillo was successfully used by professors of bioinformatics at Université du Québec à Montréal during graduate computational biology courses taught in 2010–11. The program and its source code are freely available at: <http://www.bioinfo.uqam.ca/armadillo>

    New efficient algorithm for detection of horizontal gene transfer events

    No full text
    This article addresses the problem of detecting horizontal gene transfers (HGT) in evolutionary data. We describe a new method for predicting possible HGT events that may have occurred during the evolution of a group of considered organisms. The proposed method proceeds by establishing the differences between the topologies of the species and gene phylogenetic trees. It then uses a least-squares optimization procedure to test the possibility of horizontal gene transfer between any pair of branches of the species tree. In the application section we show how the introduced method can be used to predict possible transfers of the rubisco rbcL gene in the molecular phylogeny including plastids, cyanobacteria, and proteobacteria.
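
    A minimal sketch of the least-squares criterion minimized by the optimization step, assuming the species tree induces a path-length distance matrix that is compared to the observed gene distances; the data and names below are illustrative only:

        from itertools import combinations
        from typing import Dict, List, Tuple

        def least_squares_fit(tree_dist: Dict[Tuple[str, str], float],
                              gene_dist: Dict[Tuple[str, str], float],
                              taxa: List[str]) -> float:
            """LS(T) = sum over taxon pairs of (d_T(i, j) - d_gene(i, j))^2.
            A candidate transfer between two branches of the species tree is retained
            when regrafting the transferred subtree lowers this score."""
            return sum((tree_dist[i, j] - gene_dist[i, j]) ** 2
                       for i, j in combinations(taxa, 2))

        taxa = ["A", "B", "C"]
        d_tree = {("A", "B"): 2.0, ("A", "C"): 3.0, ("B", "C"): 3.0}
        d_gene = {("A", "B"): 2.5, ("A", "C"): 3.0, ("B", "C"): 2.0}
        print(least_squares_fit(d_tree, d_gene, taxa))  # 0.25 + 0.0 + 1.0 = 1.25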

    Detecting genomic regions associated with a disease using variability functions and Adjusted Rand Index

    Get PDF
    Background: The identification of functional regions contained in a given multiple sequence alignment constitutes one of the major challenges of comparative genomics. Several studies have focused on the identification of conserved regions and motifs. However, most existing methods ignore the relationship between the functional genomic regions and the external evidence associated with the considered group of species (e.g., carcinogenicity of Human Papilloma Virus). In the past, we proposed a method that takes into account prior knowledge of such external evidence (e.g., carcinogenicity or invasivity of the considered organisms) and identifies genomic regions related to a specific disease. Results and conclusion: We present a new algorithm for detecting genomic regions that may be associated with a disease. Two new variability functions and a bipartition optimization procedure are described. We validate and weigh our results using the Adjusted Rand Index (ARI), and thus assess to what extent the selected regions are related to carcinogenicity, invasivity, or any other species classification given as input. The predictive power of different hit region detection functions was assessed on synthetic and real data. Our simulation results suggest that there is no single function that provides the best results in all practical situations (e.g., monophyletic or polyphyletic evolution, and positive or negative selection), and that at least three different functions might be useful. The proposed hit region identification functions that do not benefit from prior knowledge (i.e., carcinogenicity or invasivity of the involved organisms) can provide results equivalent to those of the existing functions that take advantage of such knowledge. Using the new algorithm, we examined the Neisseria meningitidis FrpB gene product for invasivity and immunologic activity, and the human papilloma virus (HPV) E6 oncoprotein for carcinogenicity, and confirmed some well-known molecular features, including surface-exposed loops for N. meningitidis and the PDZ domain for HPV.
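
    A minimal sketch of the Adjusted Rand Index used to weigh the selected regions, computed here from scratch between the species classification given as input (e.g., carcinogenic vs. benign strains) and the bipartition induced by a candidate region; the toy labels are ours:

        from collections import Counter
        from math import comb

        def adjusted_rand_index(labels_a, labels_b) -> float:
            """ARI between two partitions of the same objects (1 = identical, about 0 = random)."""
            n = len(labels_a)
            pair_counts = Counter(zip(labels_a, labels_b))
            sum_ij = sum(comb(c, 2) for c in pair_counts.values())
            sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
            sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
            expected = sum_a * sum_b / comb(n, 2)
            max_index = (sum_a + sum_b) / 2
            return (sum_ij - expected) / (max_index - expected)

        # Disease status vs. the two groups induced by a candidate genomic region.
        status = ["carcinogenic", "carcinogenic", "benign", "benign"]
        region = ["group1", "group1", "group2", "group2"]
        print(adjusted_rand_index(status, region))  # 1.0: the region separates the classes perfectly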

    A fast compound algorithm for mining generators, closed itemsets, and computing links between equivalence classes

    Get PDF
    In pattern mining and association rule mining, there is a variety of algorithms for mining frequent closed itemsets (FCIs) and frequent generators (FGs), whereas a smaller number further involve the precedence relation between FCIs. The interplay of these three constructs and their joint computation have been studied within the formal concept analysis (FCA) field, yet none of the proposed algorithms is scalable. In frequent pattern mining, at least one suite of efficient algorithms has been designed that exploits basically the same ideas and follows the same overall computational schema. Based on an in-depth analysis of the aforementioned interplay, which is rooted in a fundamental duality from hypergraph theory, we propose a new schema that should enable a more parsimonious computation. We exemplify the new schema in the design of Snow-Touch, a concrete FCI/FG/precedence miner that reuses an existing algorithm, Charm, for mining FCIs, and completes it with two original methods for mining FGs and precedence, respectively. The performance of Snow-Touch and of its closest competitor, Charm-L, was compared experimentally using a large variety of datasets. The outcome of the experimental study suggests that our method outperforms Charm-L on dense data, while on sparse data the trend is reversed. Furthermore, we demonstrate the usefulness of our method and the new schema through an application to the analysis of a genome dataset. The initial results reported here confirm the capacity of the method to focus on significant associations.
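
    A brute-force sketch of the three constructs at play (closed itemsets, generators, and the equivalence classes linking them), workable only on toy data but making the interplay explicit; this is our illustration, not Charm or Snow-Touch:

        from itertools import combinations

        transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
        min_support = 2
        items = sorted(set().union(*transactions))

        def tidset(itemset):
            """Indices of the transactions containing the itemset (its image)."""
            return frozenset(i for i, t in enumerate(transactions) if itemset <= t)

        # Group all frequent itemsets into equivalence classes sharing the same tidset.
        classes = {}
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                itemset = frozenset(combo)
                tids = tidset(itemset)
                if len(tids) >= min_support:
                    classes.setdefault(tids, []).append(itemset)

        # Within each class, the closed itemset is the unique maximum and the
        # generators are the minimal members.
        for tids, members in classes.items():
            closed = max(members, key=len)
            generators = [m for m in members if not any(g < m for g in members)]
            print(sorted(closed), "support:", len(tids),
                  "generators:", [sorted(g) for g in generators])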

    Fast Mining of Iceberg Lattices: A Modular Approach Using Generators

    Get PDF
    Besides its central place in FCA, the task of constructing the concept lattice, i.e., the concepts plus the Hasse diagram, has attracted some interest within the data mining (DM) field, primarily to support the mining of association rule bases. Yet most FCA algorithms do not pass the scalability test that is fundamental in DM. We are interested in the iceberg part of the lattice, i.e., the frequent closed itemsets (FCIs) plus precedence, augmented with the respective generators (FGs), as these provide the starting point for nearly all known bases. Here, we investigate a modular approach that follows a workflow of individual tasks which diverges from what is currently practiced. A straightforward instantiation thereof, Snow-Touch, is presented; it combines past contributions of ours, Touch for FCIs/FGs and Snow for precedence. A performance comparison of Snow-Touch with its closest competitor, Charm-L, indicates that in the specific case of dense data the modularity overhead is offset by the speed gain of the new task order. To demonstrate our method’s usefulness, we report first results of a genome data analysis application.
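
    A minimal sketch of the precedence step, assuming the frequent closed itemsets are already available (e.g., from the toy miner sketched above): the lower covers of an FCI in the iceberg lattice are its maximal proper subsets among the closed sets; this is an illustration, not the Snow algorithm:

        def hasse_edges(closed_itemsets):
            """Return (upper, lower) pairs where 'lower' is an immediate predecessor
            of 'upper' in the iceberg lattice, i.e., a maximal proper closed subset."""
            edges = []
            for upper in closed_itemsets:
                subsets = [c for c in closed_itemsets if c < upper]
                lower_covers = [c for c in subsets if not any(c < d for d in subsets)]
                edges.extend((upper, lower) for lower in lower_covers)
            return edges

        fcis = [frozenset({"a"}), frozenset({"a", "b"}), frozenset({"a", "c"}),
                frozenset({"a", "b", "c"})]
        for upper, lower in hasse_edges(fcis):
            print(sorted(upper), "covers", sorted(lower))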